
Conversation

@wking (Member) commented Nov 27, 2018

This reverts commit 6dc1bf6, #615.

109531f (cmd/openshift-install/create: Retry watch connections, 2018-11-02, #606) made the watch re-connects reliable, so make watch timeouts fatal again. This avoids confusing users by showing "Install complete!" messages when they may actually have a hung bootstrap process.
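
(For context, a minimal standalone sketch of the kind of reconnecting watch that #606 introduced, based on client-go's RetryWatcher, whose "Re-creating the watcher" warning shows up in the test log below; the function name and wiring are illustrative, assuming 2018-era client-go signatures, not the installer's exact code.)

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

// Sketch: re-establish a dropped event watch from the last-seen
// resource version instead of failing the wait outright.
func watchEvents(events corev1client.EventInterface, sinceRV string) (watch.Interface, error) {
	return watchtools.NewRetryWatcher(sinceRV, &cache.ListWatch{
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			return events.Watch(options)
		},
	})
}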
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 27, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2018
In cmd/openshift-install/create.go:

 if err != nil {
-	logrus.Error(errors.Wrap(err, "waiting for bootstrap-complete"))
-	return nil
+	return errors.Wrap(err, "waiting for bootstrap-complete")
@abhinavdahiya (Contributor) commented on the diff:

To compare, here is how the error would have been structured without Wrap:

fmt.Errorf("failed waiting for bootstrap-complete: %v", err)

I would suggest we keep it as errors.Wrap(err, "failed waiting for bootstrap-complete").
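
(For reference, a minimal standalone comparison of the two forms, not from the PR; with github.com/pkg/errors both print the same "message: cause" string, but Wrap preserves the cause chain.)

package main

import (
	"fmt"

	"github.com/pkg/errors"
)

func main() {
	cause := fmt.Errorf("watch closed before UntilWithoutRetry timeout")

	// errors.Wrap prints "message: cause" and keeps the cause
	// recoverable via errors.Cause
	fmt.Println(errors.Wrap(cause, "waiting for bootstrap-complete"))

	// fmt.Errorf yields the same flat string but discards the chain
	fmt.Println(fmt.Errorf("failed waiting for bootstrap-complete: %v", cause))
}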

@wking (Member, Author) replied:

$ git describe --dirty
v0.4.0-25-gb13b7fa-dirty
$ git diff
diff --git a/cmd/openshift-install/create.go b/cmd/openshift-install/create.go
index 5ba6517..8c95b8c 100644
--- a/cmd/openshift-install/create.go
+++ b/cmd/openshift-install/create.go
@@ -203,8 +203,8 @@ func destroyBootstrap(ctx context.Context, directory string) (err error) {
 				watcher, err := events.Watch(metav1.ListOptions{
 					ResourceVersion: sinceResourceVersion,
 				})
-				if err == nil {
-					return watcher, nil
+				if err == nil || true {
+					return watcher, err
 				}
 				select {
 				case <-eventContext.Done():
$ TAGS=libvirt_destroy hack/build.sh
$ openshift-install --dir=wking create cluster
INFO Fetching OS image...
INFO Using Terraform to create cluster...
INFO Waiting for bootstrap completion...
INFO API v1.11.0+d4cacc0 up
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 2214 
FATAL Error executing openshift-install: waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout 

That line includes both "FATAL" and "Error" already. Personally I think that's enough without also including an additional "failed", but I can add it if you like.
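
(For reference, a minimal standalone sketch, assuming a logrus-based top-level handler roughly like the installer's, of how the returned error becomes that line; the "Error executing openshift-install" prefix actually comes from the installer's command runner, and the names here are illustrative, not installer code.)

package main

import (
	"github.com/pkg/errors"
	"github.com/sirupsen/logrus"
)

// illustrative stand-in for the bootstrap wait, not installer code
func waitForBootstrapComplete() error {
	err := errors.New("watch closed before UntilWithoutRetry timeout")
	return errors.Wrap(err, "waiting for bootstrap-complete")
}

func main() {
	if err := waitForBootstrapComplete(); err != nil {
		// with the installer's log formatting this renders as:
		//   FATAL Error executing openshift-install: waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout
		logrus.Fatal(errors.Wrap(err, "Error executing openshift-install"))
	}
}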

@abhinavdahiya (Contributor) replied Nov 28, 2018:

Since the error is returned from this function, the first form is clearer when reading the function's return site (not a few levels up).

@wking (Member, Author) commented Nov 27, 2018

VPC exhaustion:

level=error msg="Error: Error applying plan:\n\n1 error(s) occurred:\n\n* module.vpc.aws_vpc.new_vpc: 1 error(s) occurred:\n\n* aws_vpc.new_vpc: Error creating VPC: VpcLimitExceeded: The maximum number of VPCs has been reached.\n\tstatus code: 400, request id: d1fea395

I've since cleaned up the leaked CI VPCs, and #740 is in flight to address the current leak.

/retest

@wking (Member, Author) commented Nov 27, 2018

e2e-aws:

level=error msg="Error: Error applying plan:\n\n1 error(s) occurred:\n\n* aws_route53_record.etcd_a_nodes[2]: 1 error(s) occurred:\n\n* aws_route53_record.etcd_a_nodes.2: timeout while waiting for state to become 'INSYNC' (timeout: 30m0s)\n\nTerraform does not automatically rollback in the face of errors.\nInstead, your Terraform state file has been partially updated with\nany resources that successfully completed. Please address the error\nabove and apply again to incrementally change your infrastructure."
level=fatal msg="Error executing openshift-install: failed to fetch Cluster: failed to generate asset \"Cluster\": failed to run terraform: failed to execute Terraform: exit status 1"

I'll wait until CI settles down and kick this again tonight.

@abhinavdahiya (Contributor) commented:
don't care enough about #741 (comment) (just preferred the recommendation)

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2018
@openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor) commented:

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 6971ab6 into openshift:master Nov 28, 2018
@wking wking deleted the fatal-wait-failure branch November 28, 2018 05:20
